Introduction

Welcome, welcome! This tutorial covers the basics of resting-state fMRI preprocessing, denoising, time series extraction (and parcellation), and functional connectivity analyses.

Before we dive in, let’s have a look at the general workflow.

Resting-state fMRI Workflow

Today, we will go through Steps 1-2 fairly quickly, but I have left the code for you in case you want to run it on your own. The code is written in bash/shell and has to be run in a Terminal, unless stated otherwise.

Step 1: standard fMRI preprocessing

Ensure your dataset is in BIDS format

Link to BIDS validator: https://bids-standard.github.io/bids-validator/

Run fMRIPrep.

For reference, for this to run you need Docker installed. Open your terminal and install the fmriprep-docker wrapper (copy the code below without the '#'):

# python -m pip install fmriprep-docker

Now open the fmriprep.sh script. This is an example of how to run fMRIPrep on one subject.

Check fMRIPrep outputs (QC)

Now that we have a sense of what fMRIPrep does, let's take a look at its output. Open the .html file for subject sub-019.

How to quality control (QC) an fMRIPrep HTML Report

Usually fMRIPrep is the first step of a standard fMRI preprocessing pipeline, so QC at this stage is pretty minimal, as further cleaning will be done on the data. Note: fMRIPrep outputs many confound time series (e.g., motion parameters), but does not remove these sources of noise from the data. It is up to you to decide whether and how you want to use them.

Because further denoising will be applied, here I usually look at major issues with alignment/coregistration and spend less (to no) time looking at confounds (but this is my personal preference).

For more details on QCing fMRIPrep outputs: https://andysbrainbook.readthedocs.io/en/latest/OpenScience/OS/fMRIPrep_Demo_3_ExaminingPreprocData.html

My go-to checklist:

Section / What to check:

Overall Summary
- Subject ID correct? FreeSurfer run? All output spaces present?

Anatomical Summary
- Conformation: one T1w image? Dimensions consistent across subjects? No volumes discarded?
- Brain mask and tissue segmentation: do the T1w image and segmentation look okay? No distortion/cutoff? The segmentation should fit the grey matter (red line) and the white matter (white area surrounded by the blue line).
- Normalisation: alignment okay (grey matter, white matter, ventricles)?

Functional Summary
- Check that the report is accurate (TR, sequence, slice-timing/susceptibility correction, non-steady-state volumes).
- Alignment func/struct: do the two correspond okay (keeping in mind the anatomical is skull-stripped while the functional is not)?
- Check alignment based on the ventricles, cingulate cortex, and cerebellum.
- Brain mask and CompCor ROIs: the main thing is that the red line includes the whole BOLD image (okay if slightly cut off); not okay if the red line excludes the cerebellum, brainstem, or other cortical regions.
- BOLD Summary: okay if some movement is still there (it will be dealt with later). Note if DVARS max > 4 or FD max > 1.
- The carpet plot may still show some spikes/motion. If the noise is regular, take note.

Overall errors
- Any critical errors or warnings?

Usually, this is the file that is used in Step 2: desc-preproc_bold.nii.gz. This file has been through motion correction (not the same as motion regression), slice-timing correction, distortion correction, coregistration to the structural image, spatial normalisation, and skull stripping/brain-mask extraction.

Note: It is at this stage that people usually compute QC metrics like FD, DVARS, and tSNR to quantify (and not just visually inspect) the quality of their data. FD and DVARS are already computed by fMRIPrep (check the desc-confounds_timeseries.tsv file). tSNR is not computed by default, but you can easily obtain it yourself using the desc-preproc_bold.nii.gz file and applying this formula per voxel/vertex: mean signal over time / standard deviation over time.

# # Example code to obtain a tSNR map on your data: not running it today!
# # (bold_img could be loaded with, e.g., RNifti::readNifti("desc-preproc_bold.nii.gz"))
# mean_img <- apply(bold_img, c(1, 2, 3), mean)  # bold_img is a 4D array [x, y, z, time] for one subject
# sd_img <- apply(bold_img, c(1, 2, 3), sd)      # voxel-wise SD over time
# tsnr_img <- mean_img / sd_img                  # tSNR formula: mean / SD over time

You want your data to have low FD and low DVARS (i.e., low motion), and high tSNR. Common exclusion cutoffs are: FD > 0.5 mm; DVARS > 2; tSNR < 50. These are not hard cutoffs, and they are mostly applied to 3T data.
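As a quick illustration, here is a hedged sketch of how you might flag a subject against these cutoffs. It uses a toy stand-in for the real file; framewise_displacement and std_dvars are the column names fMRIPrep writes to desc-confounds_timeseries.tsv.

```r
# Toy stand-in for one subject's confounds
# (in practice: conf <- read.delim("sub-XX_desc-confounds_timeseries.tsv"))
set.seed(1)
conf <- data.frame(framewise_displacement = c(NA, runif(99, 0, 0.3)),  # first volume has no FD
                   std_dvars              = c(NA, runif(99, 0.8, 1.5)))

# Flag the subject if motion exceeds the (soft) cutoffs discussed above
exclude <- max(conf$framewise_displacement, na.rm = TRUE) > 0.5 ||
           max(conf$std_dvars, na.rm = TRUE) > 2
exclude  # FALSE for this well-behaved toy subject
```

In practice you would loop this over subjects and record who exceeds the thresholds, rather than excluding automatically.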

Step 2: resting-state fMRI denoising

fMRIPrep does some basic preprocessing, which is great, but it still leaves lots of stuff we do not want in our data (i.e., noise that is not related to anything biologically interesting). Extra cleaning is therefore necessary; this step is called denoising. There are endless options you can pick from to denoise your data, and as such there is no gold standard for denoising in the field. You can pick your own flavour based on your question and your data.

Assume you just ran fMRIPrep: you now have (for each individual) a .nii file that has been through spatial realignment, motion correction, slice-timing correction, and distortion correction, and that has potentially been normalised to a group template. Here are some next steps that can be applied to the data:

Step / Explanation:

- Motion regression: regressing out motion parameters not dealt with in the realignment/correction step (6, 12, 24 parameters, etc.)
- Nuisance regression: regressing out physiological noise (white matter, CSF signals)
- Spike regression: an extra step to remove abrupt motion (debatable)
- ICA-based regression: extra cleaning
- Global signal regression: removing the global brain signal (debatable) to further control for motion, respiration, etc.
- Band-pass filtering: typically 0.01-0.08 Hz, to isolate frequencies of presumed neuronal origin
- Detrending/time-series normalisation: reducing scanner-induced drift and enabling comparison across voxels/vertices/regions
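To make the regression-based steps above concrete, here is a minimal sketch (on simulated data, with made-up variable names) of what "regressing out" confounds means: fit a linear model per region and keep the residuals.

```r
# Simulated data: 100 timepoints, 3 regions, 6 nuisance regressors (e.g., motion parameters)
set.seed(1)
n_t <- 100
confounds <- matrix(rnorm(n_t * 6), n_t, 6)
ts <- confounds %*% matrix(rnorm(6 * 3), 6, 3) + matrix(rnorm(n_t * 3), n_t, 3)

# Regress the confounds out of each region's time series and keep the residuals
denoised <- apply(ts, 2, function(y) residuals(lm(y ~ confounds)))

# The residuals are (numerically) uncorrelated with every confound
max(abs(cor(denoised, confounds)))
```

Real pipelines build one design matrix containing all chosen regressors (motion, WM/CSF, spikes, etc.) and regress them out in a single step, so that shared variance is handled correctly.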

Note: the data we are going to use in this tutorial have not been through this extensive denoising pipeline.

Our data come from an openly available dataset (HCP 3T young adults resting-state dataset https://www.humanconnectome.org/study/hcp-young-adult/data-releases). We will be doing our analyses on a small group of individuals (n=5). Data are in volume space.

Step 3: resting-state fMRI time-series extraction

Our minimally preprocessed data are in voxel space. Typically, though, resting-state analyses are carried out at the level of parcels or nodes (equivalent terms for brain regions). Parcels/nodes/regions are obtained by averaging contiguous voxels together. This process of converting brain images from voxel space to parcel/node/region space is called parcellation. The parcellation is then used to extract the mean signal from each parcel (this is the time-series extraction).
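The idea can be sketched on a toy example: given a 4D BOLD array and a 3D array of parcel labels, time-series extraction is just a mean over the voxels of each parcel (all variable names here are made up, and the image is tiny on purpose).

```r
set.seed(42)
bold   <- array(rnorm(4 * 4 * 4 * 10), dim = c(4, 4, 4, 10))               # tiny [x, y, z, time] image
labels <- array(sample(0:3, 4 * 4 * 4, replace = TRUE), dim = c(4, 4, 4))  # 0 = background

n_t     <- dim(bold)[4]
flat    <- matrix(bold, ncol = n_t)                     # reshape to voxels x time
parcels <- sort(setdiff(unique(as.vector(labels)), 0))  # parcel IDs, background excluded

# One mean time series per parcel: average over that parcel's voxels at each timepoint
ts <- sapply(parcels, function(p) colMeans(flat[labels == p, , drop = FALSE]))
dim(ts)  # time x parcels
```

Tools like FSL's fslmeants (used later in this tutorial) do exactly this, just on real NIfTI images.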

There are many parcellations out there. The most popular one in functional neuroimaging at the moment is the Schaefer parcellation https://github.com/ThomasYeoLab/CBIG/tree/master/stable_projects/brain_parcellation/Schaefer2018_LocalGlobal. There are different flavours of Schaefer, with different numbers of regions and networks. Today, we are going to use the 200-region, 7-network version.

# Let's take a look at the Schaefer 200-7 parcellation
library(ggseg)          # geom_brain() and position_brain()
library(ggsegSchaefer)  # provides the schaefer7_200 atlas
library(ggplot2)
library(dplyr)          # left_join()

# Load network file
palette <- read.csv("../atlas/Palette.csv")  # this file has: Region, Network, Hex_code

# Extract atlas data frame
schaefer_df <- as.data.frame(schaefer7_200)

# Join with palette to add network info
schaefer_annotated <- left_join(schaefer_df, palette, by = c("region" = "Region"))

# Define desired order of networks from the palette
network_order <- unique(palette$Network)  # otherwise it orders everything alphabetically
palette$Network <- factor(palette$Network, levels = network_order)
schaefer_annotated$Network <- factor(schaefer_annotated$Network, levels = network_order)

# Create named color vector from Network names and their hex codes
network_colors <- setNames(palette$Hex_code, palette$Network)

# Plot by network
ggplot() +
  geom_brain(atlas = schaefer_annotated,
             mapping = aes(fill = Network, geometry = geometry),
             position = position_brain(hemi ~ side)) +
  scale_fill_manual(values = network_colors,
                    na.translate = FALSE) +
  theme_void() +
  labs(title = "Schaefer 200-7 Parcellation by Network",
       fill = "Network")

Awesome! Now let’s apply this parcellation to our data and extract the time series :)

# I did this already to speed up the process, using FSL-based commands. Here is an example for one subject, assuming your data and your atlas are in the same folder:
# fslmeants -i sub-101309_rfMRI_REST1_LR.nii -o sub-101309_Schaefer2007.txt --label=Schaefer2018_200Parcels_7Networks_order_FSLMNI152_2mm.nii

Let’s check that it all worked okay by plotting all the time series for one subject.

# Let's load in one subject's parcellated data
library(tidyr)  # pivot_longer()
df <- read.table("../data/sub-101309_Schaefer2007.txt")

# Add a Time column so we can create a nice labelled plot
df$Time <- 1:nrow(df)

# Reshape data frame to long format
df_long <- pivot_longer(df, 
                        cols = -Time, 
                        names_to = "Region", 
                        values_to = "Signal")
# Plot
ggplot(df_long, aes(x = Time, y = Signal, color = Region)) +
  geom_line(alpha = 0.6) +
  theme_minimal() +
  labs(title = "ROI Time Series", x = "Time (TRs)", y = "fMRI BOLD Signal") +
  theme(legend.position = "none") 

What do you notice from this plot? Which preprocessing/denoising steps do you think are missing? There are obviously a couple that are noticeable (since these data only went through a minimal preprocessing/denoising pipeline). However, there is one that is crucial to ensure that we can safely compare signals across regions.

# Let's fix this: z-score each region (excluding the Time column we added earlier)
df_z <- scale(df[, names(df) != "Time"], center = TRUE, scale = TRUE)
df_z <- as.data.frame(df_z)

# Add a Time column so we can create a nice labelled plot
df_z$Time <- 1:nrow(df_z)

# Reshape data frame to long format
df_z_long <- pivot_longer(df_z, 
                        cols = -Time, 
                        names_to = "Region", 
                        values_to = "Signal")
# Plot
ggplot(df_z_long, aes(x = Time, y = Signal, color = Region)) +
  geom_line(alpha = 0.6) +
  theme_minimal() +
  labs(title = "ROI Time Series", x = "Time (TRs)", y = "fMRI BOLD Signal") +
  theme(legend.position = "none")

Phew, fixed! I already applied this step to all subjects’ files. These data are located in the /data/ folder (files begin with “z_”). Let’s take a look:

Any comments on these plots? What do you notice?

Step 4: resting-state fMRI analyses

Let’s do some analyses on these data now! First thing, let’s calculate functional connectivity (FC). As a refresher, FC is a statistical construct derived (typically) as the Pearson correlation between the time series of pairs of brain regions. The output is therefore a region x region matrix where each entry indicates how strong the correlation is between two regions. In our case, our FC matrix will be 200x200.

rm(list=ls())

# Load in your data
data_dir <- "/Users/gbar0888/Desktop/Teaching/fMRI_Preprocessing_Connectivity_rest/Practice/data" #change this to where your data are
files <- list.files(data_dir, pattern = "^z.*\\.txt$", full.names = TRUE) #all our normalised files

# Since we have multiple subjects, we are going to calculate FC and store these matrices in a list
fc_matrices <- lapply(files, function(file) {
  ts <- as.matrix(read.table(file)) #convert to matrix format to do calculations
  
  # Pearson's correlation
  cor_mat <- cor(ts)
  
  return(cor_mat)
})

# let's look at the distribution of these FC values: let's look at subject 4
hist(fc_matrices[[4]]) 

# For group analyses, we need to normalise these FC values: Fisher-z transformation
z_matrices <- lapply(fc_matrices, function(correlation) {
  z_mat <- atanh(correlation)
  
  # Clean matrix: replace Inf values that come from Fisher z-transform
  z_mat[!is.finite(z_mat)] <- NA
  
  return(z_mat)
})

hist(z_matrices[[4]]) #let's check if normalisation worked

# Let's average all FC matrices across our subjects and then visualise it
library(superheat)  # superheat() heatmap with network memberships
group_fc <- Reduce("+", z_matrices) / length(z_matrices)
group_fc <- as.data.frame(group_fc)

# Load network labels from atlas
atlas <- read.csv("../atlas/Palette.csv")

colnames(group_fc) <- atlas$Region
rownames(group_fc) <- atlas$Region

networks <- atlas$Network
networks <- factor(as.character(networks), levels = unique(networks))

reds <- colorRampPalette(c("#fee5d9", "#fcae91", "#fb6a4a", "#de2d26", "#a50f15"))(100)

superheat(group_fc,
          #heat.lim = c(0.2, 0.8),
          #extreme.values.na = FALSE,
          membership.rows = networks,
          membership.cols = networks,
          left.label.size = 0.4,
          bottom.label.size = 0,
          scale = FALSE,
          heat.pal = reds,
          # make the legend bigger
          legend.height = 0.25,
          legend.width = 2,
          legend.text.size = 10)

It is a bit hard to see what is going on here, right? You can play with the value limits, but in general you want to make sure you can see boxes in your matrix reflecting the brain’s functional network organisation.
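One hedged way to quantify those "boxes", shown here on a toy 6-region matrix with two made-up networks rather than on our real group matrix, is to compare the mean within-network FC to the mean between-network FC:

```r
# Toy 6x6 FC matrix: two 3-region "networks" with stronger within-network FC
net <- rep(c("A", "B"), each = 3)
fc  <- matrix(0.1, 6, 6)
fc[1:3, 1:3] <- 0.6
fc[4:6, 4:6] <- 0.6
diag(fc) <- NA                       # ignore self-connections

same    <- outer(net, net, "==")     # TRUE where both regions are in the same network
within  <- mean(fc[same], na.rm = TRUE)
between <- mean(fc[!same], na.rm = TRUE)
c(within = within, between = between)  # within >> between when the blocks are real
```

Run on real data (using the atlas Network labels), this gives a single-number sanity check that the expected block structure is present.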

Graph Theory application

Let’s take this a step further. Let’s calculate how much each region is connected to the rest of the brain, a measure that is called node strength. Node strength (or region strength) is “the sum of weights of links connected to the node” (from https://sites.google.com/site/bctnet/list-of-measures). This will allow us to obtain a 1x200 vector that we can relate to a bunch of other measures of brain organisation.

# Node strength: sum of each region's connection weights
node_strength <- data.frame(Value  = rowSums(group_fc, na.rm = TRUE),
                            region = atlas$Region,
                            row.names = NULL)

node_strength %>%
  ggplot() +
  geom_brain(atlas = schaefer7_200,
             position = position_brain(hemi ~ side),
             mapping = aes(fill=Value, geometry=geometry)) +
  scale_fill_viridis_c(limits = c(80, 110), oob = scales::squish)
## merging atlas and data by 'region'

#write.csv(node_strength, "../data/Group_nodalstrength_2007Schaefer.csv") #write output to csv file

Other applications

Now it is your turn! Open RStudio and open this file: Tutorial_fMRI_forstudents.R in the /code folder.